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Abstract 



< 

w . The quasi-optimality criterion chooses the regularization parameter in 

inverse problems without taking into account the noise level. This rule 
works remarkably well in practice, although Bakushinskii has shown that 
there are always counterexamples with very poor performance. We propose 
an average case analysis of quasi-optimality for spectral cut-off estimators 
' and we prove that the quasi-optimality criterion determines estimators 

, which are rate-optimal on average. Its practical performance is illustrated 

with a calibration problem from mathematical finance. 

O ■ 

1 Introduction 

We consider the prototype of a linear inverse problem where we observe y = 
Ax+^ with a compact operator iona Hilbert space X and some noise variable 
£ and where we try to recover a stable approximation of the true solution x £ X. 
In this setting we address the question, why the so-called quasi-optimality crite- 
rion for choosing the regularization parameter in stable inversion algorithms, as 
proposed by Tikhonov and Arsenin (1977) and Tikhonov, Glasko, and Kriksin 
(1979), works remarkably well in many practical situations. 

The classical quasi-optimality criterion is applied to the regularized solutions 

x a = (A* A + aiy 1 A*y, a > 0, 

based on Tikhonov's method and proposes to choose a > such that 



da 

Reparametrizing with a = q n for q E (0, 1) and n£l yields — ► min n ! 
When for practical purposes only a discrete grid {q n \ n G IN} is considered, 
then this reduces to the following criterion 

\\x q n — x n+i\\ — > min! 
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Obviously this parameter choice rule does not rely on any knowledge concerning 
the operator, the solution and the noise variable. 

On the other hand, Bakushinskii (1984) has shown for deterministic noise £ 
that a method for choosing the regularization parameter should depend on the 
noise level, whereas quasi-optimality does not. 

Theorem 1.1. // the regularized solution operator R : X — ► X does not depend 
explicitly on the noise level 5, then for any (infinite rank) compact operator 
A : X — > X there is a y £ Dom(A + ) for which 

lim sup \\Ry s -A + y\\ > 0, 

\\y-ys\\<$ 

where A + denotes the generalized inverse of A. 

The core message of our paper is that while Bakushinskii's result is true for a 
worst case analysis, that is for any such method R certain counterexamples can 
be constructed, it is not necessarily true when we assess a method by its average 
case performance. In particular, quasi-optimality works well on average. Our 
analysis applies also to variants of quasi-optimality like Hardened Balancing 
(Bauer 2007), which are sometimes preferable in practice. 

For a general overview about practical inverse problems, their abstract 
mathematical formulation and methods for choosing the regularization param- 
eter we refer to the monograph Engl, Hanke, and Neubauer (1996), while the 
Bayesian framework, which is related to our approach, is discussed by Kaipio 
and Somersalo (2005). The choice of the regularization parameter is, of course, 
a perennial problem in nonparametric statistics, see e.g. Wasserman (2006), and 
diverse methods have been applied to statistical inverse problems, e.g. general- 
ized cross validation (Wahba 1977), Lepski's method (Lepski 1990, Bauer and 
Pereverzev 2005) or wavelet thresholding (Cohen, Hoffmann, and Reifi 2004), 
which depend, however, heavily on the knowledge of the noise level. The easy to 
use and heuristically motivated quasi-optimality criterion and its variants are 
attractive in practice because they do not rely on the knowledge of the noise 
level. 

First considerations, why this kind of methods might work, are already given 
by Bauer (2007). In Section [3] below, we derive the proper mathematical result 
that estimators, based on a spectral cut-off scheme and the quasi-optimality 
criterion for selecting the cut-off value, are rate-optimal on average. Specifying 
an average case scenario amounts to prescribing a Bayesian a priori law for the 
functions to be estimated. We suppose that the coefficients of the solution in 
the singular value decomposition are normally distributed around zero. Only a 
very general condition on the decay property of the variances will be imposed for 
the proof of the theorem. The precise setting is described in Section [21 which is 
followed by the mathematical analysis in Section [3l The proofs are postponed 
to Section [6j The numerical example in Section [J] treats the calibration an 
option price model from mathematical finance. In this typical inverse problem 
in financial engineering the choice of the regularization parameter is extremely 
difficult and the proposed methods yield comparatively good results. A short 
outlook in Section [5] concludes. 
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2 The framework 



2.1 Observations and estimators 

We consider A : X — > X, a compact, self-adjoint and positive-definite operator 
on the real Hilbert space X with singular value decomposition 

oo oo 

Ax = S ^X{k){x,u k )u k = S ^X(k)x k u k , (1) 

k=l fc=l 

where is an orthonormal basis of eigenvectors and the positive eigenvalues 
(X(k)) k are arranged in decreasing order, satisfying lim^oo X(k) = 0. From the 
general observation model 

y = Ax + £, £ noise, 

we immediately go over to a sequence space model by considering the coordi- 
nates with respect to (u k ). We observe 

y k = X(k)x k + e(k)£k, k>l. 

Here, £ k are i.i.d. standard normal random variables and (x k ) the coordinates of 
the unknown quantity x, which is to be estimated. Writing tr(fc) := e(k)/X(k), 
the empirical coefficients of x are given by 

x k := X(k)~ l y k = x k + e(k)X(k)~ l i k = x k + a{k)£ k , k>l. (2) 

Equation ([2]) is the abstract observation model we shall consider from now on. 
We shall use a subsampling function I : IN" — ► M with l(n + 1) > £(n). Applying 
a spectral cut-off scheme, our estimator of x at level n for a given subsampling 
£ is defined as 

t(n) 
fc=l 

Its mean squared error (MSE) is given by the bias-variance decomposition, see 
e.g. Wasserman (2006), Engl, Hanke, and Neubauer (1996): 

i(n) oo 

E[||fM - x[| 2 ] = $>(*) a + x l 

k=l fc=£(n)+l 

Recall that the bias- variance dilemma is the fact that the variance, i.e. the first 
sum, increases with n, while the squared bias, i.e. the second sum, decreases. 
The value of n where the total sum is minimal depends on bias and variance 
and thus on the properties of the unknown x and of the noise level cr(-). The 
quasi-optimality criterion gives a data-driven choice for n. 
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2.2 Assumptions 

Let us now adopt a Bayesian point of view and perform an average case analysis. 
We weight the coefficients of x by the prior distribution 

Xk N(0,~/(k) 2 ), k>l. 

Assumption 12.11 below will implicitly require certain decay properties of the 
variances 7(/c) 2 for k — > oo, but we need not know them in detail. Writing E 
for the joint expectation with respect to (£(&)) arm (%k), the Bayesian risk R2 
for the MSE is given by 

t{n) 00 

R 2 (x^f := E[||x (n) - x|| 2 ] = °( k ? + Yj 7 ^ 2 ' 

k=l k=£(n)+l 

Introducing 

i{n) 

s{n) := Y^ a (k) 2 (variance), 

k=l 

00 

b{n) := Yj l(k) 2 (mean squared bias) 

k=£(n)+l 

yields the risk decomposition R2(x^) 2 = s(n) + b(n). 

Assumption 2.1. (Geometric growth, decay rates) We assume that with some 
1 < c s < C s , 1 < q, < Cj, we have for all n > 1 

c s s(n) < s(n + 1) < C s s(n), c b b(n + 1) < b(n) < C b b(n + 1). 

This assumption can be easily fulfilled for most moderately ill-posed prob- 
lems with Holder source conditions (Engl, Hanke, and Neubauer 1996) by choos- 
ing an exponential subsampling function I. 

Example 2.2. Assume that we have a moderately ill-posed problem with Holder 
source conditions (i.e. X(k) x k~ u , v > 0, and ^(k) x k~ ^ , n> 1/2) and white 
noise of level 5 > (i.e. e(k) = 6, and hence o~(k) = SX(k)^ 1 >c bk v ). This 
white noise setting is assumed for simplicity, but is not strictly necessary. 

Then choosing an exponential subsampling like £(n) = £(0)h n with £(0) £ IN, 
h > 1 independent of the noise level and the source conditions, we obtain 

i(n) e(o)h n 
S (n) = 5> W 2 x £ ^x«^(^r 

k=l k=l 

oo oo 

b (n)= £ 7 ^ 2 x £ fc- 2 ^ e -%^(h-*^r 

k=e(n)+l k=e(0)h™+l 

and hence s(n) and b(n) fulfill Assumption \2.1\ 
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Right now we will introduce a weight function \i which will allow to treat 
also variants of quasi-optimality, cf. Remark 12.61 below. 

Assumption 2.3. (weight function) Let x '■ IN — > R be a weight function which 
satisfies for some constants c x , C x with c^ 1 < c x < C x < c s : 

c x x(n + 1) < x(n) < C x x(n + 1). 

Example 2.4. For x{~) = 1 Assumption \2.3\ is obviously fulfilled because of 
A s sumption \ 2. 1[ Using that s(n) is increasing, Assumption \2.3\ also holds for 
x(n) = s(n) -1 / 2 . 

2.3 Choosing the regularization parameter 

Definition 2.5. Given a weight function x> we choose the cut-off level in a 
data-driven way according to the minimum distance or quasi-optimality crite- 
rion (Tikhonov and Arsenin 1977, Tikhonov, Glasko, and Kriksin 1979): 

n* := argmin{x(n)||x (ri+1) -f (n) || 2 ). 

n>l •> > 

Remark 2.6. For %(•) = 1 this is a discretized version of the quasi-optimality 
criterion; for x{ n ) = sin)^ 1 ^ 2 a version of the hardened balancing principle 
(Bauer 2007). In Theorem \ 3.4\ it is proved that the infimum of the criterion 
over n is almost surely attained and thus n* is well defined. It is unique when 
we take as minimizer the smallest index n where the minimum is attained. 

Obviously, we do not need at any point an explicit or implicit knowledge of 
the noise level for computing ||x^ n+1 ^ — x^ n ^||. 

The question remains how to minimize the criterion numerically. First of 
all, we have in practice an idea about a lower bound for the noise (e.g. the 
machine precision). Furthermore, in practical applications we only have a finite 
number of observations and we shall never deal with more coefficients in the 
sequence space model than observations available. 

2.4 Intuition for the proof 

Let us consider the case x(") = 1- At a first heuristic level the quasi-optimality 
criterion is plausible because by the geometric decay and growth in Assumption 

E[||x( n )-x(™ +1 )|| 2 ] = (a(n+l)-s(n))+(6(n)-6(n+l)) x s(n)+b(n) = R 2 (^ n) ) 2 . 

Hence, minimizing the criterion is related to minimizing the average case risk. 
Note, however, that a careful analysis is needed since the criterion is random 
and not at all independent of the estimator. 

More generally, Definition 12.51 can be understood as a search for the in- 
tersection point of the decreasing function &(■) and the increasing function 
s(-), where the minimal risk (with respect to n) is attained up to some constant 
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n# n* 



Figure 1: Idea of Proof 

factor, see Figure [T] and Lemma l3.2l b elow . Usually, comparably much is known 
about the variance part s(-) and not so much about the bias part b(-). 

Let us briefly explain in words why the data-driven choice n* will be close 
to the intersection point and therefore the resulting error in estimating x 
does not change in order. As we face a symmetric situation (by exchanging the 
role of s(-) and &(•)), we consider n* on the right side of (n* > nr). 

In expectation (the situation depicted in Figure [I]) the error curves meet at 
the oracle value n*. Here, though, we need to compute the probability that 
another point n* yields the minimal point for [|x( n+1 ) — x^ n ^||. 

To bound this probability, we introduce a threshold; the probability that 
|| > threshold(n*) or ||£( n * +1 ) - f( n *) || < threshold(n*) is larger 
than the desired probability for ||x^ n#+1 - ) — > ||x ( - n * +1 ) — but much 

easier computable. 

Furthermore, we completely ignore the fact we can just have one minimum, 
we replace E[||a: — || 2 ] by the sum of all positions where n* beats weighted 
by the probability of their occurrence. Even for this very rough estimation 
this sum is still bounded from above by the order of the oracle. The formal 
mathematical statements are formulated along these lines in the next section. 

3 The mathematical analysis 

In this section we first determine a rate-optimal estimator for different moment- 
type losses, based on the (unrealistic) knowledge of the index where the 
curves &(•) and s(-) intersect. Then we show that this rate does not deteriorate 
when we take the estimator with the data-driven index n*, based on the quasi- 
optimality criterion of Definition 12.51 For the sake of readability all proofs are 
deferred to Section [H 
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3.1 Risk for different moments 

We denote the normalized Bayes risk for the moments of order a > by 

R a {x {n) ) :=E[\\x in) -x\\ a ] 1/a , 

consistent with the definition of R2(x^). Note the order i? a (i( ra )) < Rp(x^) 
for a < 0. 

Definition 3.1. Let be the index where s(n* + 1) > 6(n* + 1) and s(n*) < 
b(n#). 

Note that is uniquely defined by the monotonicity properties of s(-) and 
&(•). Interestingly, for the risk is up to a multiplicative factor minimal, even 
under different moments: 

Lemma 3.2. Denote by T the Gamma function and by Q a standard normal 
random variable. With 

K a := ^/2C b {AT{^ + l))V« e -E[log(|<|)] 
the following bounds hold for all a > 0: 

Vn G IN : R a {x^) > e ^ l ° s( -^ R 2 (x {n) ), 

Vn G IN : R a (x^) < (4r(f + l)) l / a R 2 {&^), 

mini? a (x (n) ) < R a {x {n * ] ) < K a mmR a (x^). 

n n 

We conclude from the last property that the risk (the expectation of the 
error) at the intersection point of s(-) and &(•) is bounded by a constant multiple 
of the minimal possible risk. The quasi-optimality criterion aims at recovering 
this index 

3.2 Main result 

The first proposition quantifies the probability P(n* = n). 
Proposition 3.3. Grant Assumptions \2. 1\ and \2.3[ Then 

P(n* =n)< (v / 2 + {2er(n)) r( - n ^ 2 ){p(n)log(p(n)- 1 )) r ^/ 2 , 

where 

( s ._ \x{n) ((s(n + 1) - s(n)) + (b{n) - b{n + 1)))| 
r{n) .— 



max^ (n)<fe < £(n+1) (o-(£;) 2 + j(k) 2 ) 
and 

' g ^ 1 (c s C- 1 )-\ n - n# \, forn>n* 



pin) 



^^i(c fe c x )-l"-" # l, forn<n#. 



Since p{n) decays exponentially fast, this result yields a rapid decay for the 
probability that the minimal value for the quasi-optimality criterion is obtained 
away from the index n*. The slope of the exponential is largely depending 
on the subsampling function used, which will be investigated further in the 
upcoming example. We are now prepared to state and prove our main theorem. 
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Theorem 3.4. Grant Assumptions \2. i\ and \2.3\ and set r := inf n r(n). Assume 
that a > satisfies 

-\og(c b c x ) \og(c s /C x ) 



a < rmm , 

log(Cfe) log(C s 

Then our estimator is almost surely well defined, i.e. the minimum in Definition 
\2.5\ for n* is indeed attained, and it satisfies the oracle-type inequality 

Ra{x {nt) ) < KmmR a (x^) 

n 

with a constant K = K(c s ,Cb, C s , C&, c x , C x ,r, a) > 0. 

Remark 3.5. A closer look at the proof of Proposition HOI reveals that for this 
result only the boundedness and the decay behaviour of the Gaussian density 
was used. Hence, a similar result will hold for more general noise densities 
with exponential decay. 

This oracle-type result, see e.g. Wasserman (2006) for a general discussion, 
is actually better than the classical statement obtained in inverse problems. 
There the results are given in the form (Engl, Hanke, and Neubauer 1996) 

Wx-x^W < c5 T , 

where r is dependent on the type of noise and the source conditions. When x 
or the noise have "accidentally" better properties than expected we still have 
this bound although min n \\x — x^\\ <C c5 T for 5 < 5q. 

In our case of an oracle inequality this cannot happen because we always 
compare with the best possible ("oracle") solution. So we still have the theoret- 
ical upper bound in the form c5 T but can additionally claim, that (in average) 
we are very close to the best possible solution, even if it is much better than 
any specific rate derived. 

Although the mathematical result depends on a number of constants, it is 
important to note that the existence of these constants is required to derive 
the mathematical result, their actual values are not at all used for the quasi- 
optimality criterion. 

It is interesting to note that though we face a stochastic noise we do not 
lose a logarithmic factor in the data-driven method as is sometimes the case for 
other parameter choice methods (Cohen, Hoffmann, and Reifi 2004, Bauer and 
Pereverzev 2005). 

The general picture is that our data-driven choice n* of the cut-off value 
yields an optimal risk bound up to constants whenever the moments taken 
are not so large. In order to achieve higher moments, we need to choose a 
subsampling function £(n) that makes r large enough. In any case, we have 
r > 1 by definition. 

Example 3.6. For the standard case of quadratic risk we need a = 2. For very 
tight exponential bounds C s « c s , C& c& and x(-) = 1 it suffices to have r > 2. 
Now we can reconsider Example 1 2. 21 where 
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Hence we have 



r(n) f» 



mm < 



2u+l 




£ (Q)-(2 M -1) 



^-2 M +l)n+1^2 M 



-1 



1) 



> 



^(O)/^ 1 ) 



=£(0) min 



2u+l 



1 




and so in the worst case 



2 < r < r(l) « £(0)h(h - 1) = £(2) - £(1). 



77ms i/ie /zrsi (and smallest) distance 1(2) — 1(1) needs to be at least 3 in this 
case. This finding corresponds very well with numerical experience. 

4 Application 

In Bauer (2007) the quasi-optimality, Lepski balancing and hardened balanc- 
ing principles have been compared numerically. The findings are that both, 
for a large scale stochastic experiment inverting ill-conditioned matrices and 
for a more realistic experiment determining the field of gravity from satellite 
data, quasi-optimality and hardened balancing perform quite well and stably, 
in particular better than the Lepski balancing principle. 

The advantage of quasi-optimality is that it can cope with a rather unknown 
structure of the noise. Therefore we will present in the sequel numerical exper- 
iments for an inverse problem arising in option pricing where noise enters from 
various sources and its level is not easy to estimate. 

The calibration of financial models based on option prices has attracted 
increasing attention recently due to its practical importance and mathematical 
challenges, see e.g. Crepey (2003) and Egger, Hein, and Hofmann (2006) and 
the references therein for the case of a generalized Black-Scholes model and 
Chapter 13 in Cont and Tankov (2004) for jump models as considered here. 

4.1 The calibration problem 

We consider the problem of calibrating an exponential Levy model based on 
market prices of European options and closely follow Belomestny and Reifi 
(2006). We briefly describe the problem, but refer to Cont and Tankov (2004) 
for a thorough introduction. It is assumed that we observe prices C(Kj) of Eu- 
ropean call options with different strike prices Kj, j = 1, . . . , n, and same time 
T > to maturity and that the underlying stock price follows an exponential 
Levy process 



Here, the Levy process is restricted to be a superposition of a Brownian motion 
of volatility a 2 > 0, a linear drift of slope 7 6 K, and a compound Poisson 
jump process of intensity A > with jump density v : R — ► R + . The goal 
is to estimate these model parameters, in particular the jump density, which 



St = Soe 4 with a Levy process (L t ). 



9 



because of only finitely many observations and the presence of bid-ask spreads 
is a typical inverse problem with noisy observations in quantitative finance. The 
knowledge of these parameters permits to get a clear picture of the expectations 
at the market concerning future jumps in the stock price, which is essential for 
well-founded risk management and pricing of path-dependent options. 

We transform the strike prices (Kj) to the so called log-forward money- 
ness (xj) and the call option prices C(Kj) to a better behaved generalized op- 
tion price function O(xj) and we introduce the weighted jump density fj,(x) = 
e x v(x). Then the forward formula, expressing option prices in terms of the 
model parameters, is given in the spectral domain by (J- denotes the Fourier 
transform) 

jr C{v) _ 1 ~ eMT(-o- 2 (v ~ i) 2 /2 + ij(v - i) + T^jv) - A)) ^ 

v(v — i) 

cf. Equation (2.7) in Belomestny and Reifi (2006). Our observations are mod- 
eled as 

Oj = O(xj) +ej, j = l,...,n, 

with i.i.d. and centered noise variables (sj). For the sake of an easier presen- 
tation here, we assume that the real parameters (c 2 ,7, A) are known such that 
the backward formula, expressing the transformed jump density in terms of the 
option prices, is given by 

Ffi(y) = T- 1 log(l - v(v - i)TO{v)) +a 2 (v- if/2 - ij(v - i) + A, v e R . 

We construct an empirical version O from the observations (Oj) (e.g. by linear 
interpolation) and obtain by substitution in this formula an empirical version 
J~fi{v) of J-fi(y), which satisfies for small noise levels 

l-v(v- i)FO(v)\ _ , v(v - i)T{6 - 0)(v) 



,,W W M 1 , /I — V(V — l)J-U(V)\ 



l-v(v- i)FO{v) J l-v(v- i)TO{v) 

This first order analysis already reveals the ill-posedness of the calibration prob- 
lem: the higher the frequency |u|, the more the error T{0 — 0)(v) in the obser- 
vation domain is amplified by the factor v(v — i) as well as by the denominator 
which tends to zero for \v\ — > oo. The nonlinearity is reflected by a noise level 
which depends on the unknown true value J-(D(v). A natural approach is to 
cut-off high frequencies and to consider for U > the estimators 

fiuix) = J r-1 (J r Atl[_ C / )£ /])(a:), x G R. 

Let us write abstractly 

Fji(v) = Tn{v) + a(y)£(y), v e R, 

with o~(v) denoting the noise level and £(v) denoting the normalized noise vari- 
able, that is the difference between empirical and true value divided by its 
standard deviation, at frequency v. Now we see the analogy with the sequence 
space model ([2]) analyzed before. In fact, the only difference is the continuity 
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Simulation 



Optimal Solution 




Figure 2: left: Data, right: optimal solution 

of the spectral parameter v instead of k £ M and the additional difficulty is the 
much more complicated noise structure. Nevertheless, the estimators ~pu are 
provably rate-optimal for the right choice of the cut-off frequency U (the prob- 
lem is severely ill-posed in general). In simulations the standard data-driven 
choices of U (e.g. Lepski's method, cross validation) had a poor performance 
compared to the oracle choice, mostly because of a very badly known noise 
structure. This is exactly why the quasi-optimality criterion is of interest here. 

4.2 Experimental Setup 

In total we performed 1000 independent experiments. Each of them was set up 
as follows (notation as in Belomestny and Reifi (2006) where more details can 
be found): 

• 100 design points (xj) were chosen at random, 50 according to a standard 
normal, 50 according to a uniform distribution on [—4,8]. 

• Corresponding observations (Oi) are generated by calculating the exact 
value and adding standard normal noise of 3% relative noise level, i.e. 
Oi = 0{xi){l + 0.03ej) with e, ~ JV(0, 1). The other characteristics are 
chosen as follows (corresponding to the so-called Merton model): volatility 
a = 0.1, jumps are standard normal with intensity A = 5 (the jump 
density v in Figure [2] (right)), maturity T = 0.25. The value of 7 is 
obtained by imposing a martingale condition. In Figure [2] (left) the true 
option price function 0{ ) is shown together with the observations as a 
function of x. Abstractly, one can show that in this setting both, the bias 
and the variance term have exponential decay respectively growth in U. 



In total 60 cut-off frequencies (U n ) were chosen with step-width 0.8, i.e. 
U n = 0.8n, n = 1,...,60. 



4.3 Parameter choice methods 

We will compare three different parameter choice methods. The quasi-optimality 
and the Hardened Balancing Principle are treated in this article and Lepski's 
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method serves as a widely used benchmark. 
Quasi-Optimality. 

We use exactly as in Definition 12.51 with %(•) = 1 

n qo := argmin{||x (n+1) -x (n) ||}. 

n 

Lepski-type method. 

Define 

f(n) := max \ 4~ 1 \\x^ — x^\\/s(m) > , niepski '■= argmin{Vm > n : (m) < 

n<m<N I J n 

For the choice of n one should theoretically use a quantity depending on the 
noise level and larger than 1. In practice, however, it is observed that k £ 
[0.25,0.75] gives superior results and works in more or less any situation. Here 
we choose k = 0.75. 

Hardened Balancing Principle. 

We reuse the computed quantity f(n) defined above and choose as regulariza- 
tion parameter 

riHBP ■= argmin{/ (n)y/s(n)} 

n 

Due to the definition of / it can be seen as a stabilized version of Definition 
12.51 with x( n ) = s(^) 1 ^ 2 - This stabilization mainly counters the effects of a 
subsampling with inappropriately small spacing between the cut-off points. 

Implementation details. 

There are certain fine points in the implementation, which are not crucial, but 
yield slightly superior results. The important point for the methods proposed 
here is the validity of Assumption 12.11 Therefore we estimated s(-) out of 10 
independent data sets (each of them constructed as given above, i.e. each of 
them with different design points and random error) and removed the cut-off 
parameters n S {17, 27, 36, 44, 46, 48} which seem to violate the assumption 
empirically. 

The Lepski balancing principle and Hardened Balancing show undesirable 
properties when s(max)/s(n) < 2. Therefore we restricted the region of ad- 
missible regularization parameters further to n £ {1, . . . 43} \ {17, 27, 36}. The 
remaining cut-off points were still used to calculate the values f(n). 

4.4 Experiment 

In Figure [3] the distribution of the efficiencies is displayed, i.e. the ratio of 
errors between the unsupervised solution selected by one of the methods and 
the optimal one, i.e. 

efficiency(x, y, n chosen ) := ||x( nc ' lose ") - x\\/ min \\x^ - x\\. 
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jQuasi-Optimality 
I Lepski Balancing 
I Hardened Balancing 




■K - 



1.00 1.05 1.14 1.30 1.57 2.08 3.16 5.94 15.31 64.0 huge 
Mean: 2.1 0/6,87/ 2,64 Median: 1,39/4,33/1,61 

Figure 3: Efficiencies 



Note that the ratio cannot be smaller than 1. Remark further that it is much 
harder to obtain a low value for E efficiency (x, y, n c h osen ) compared to an oracle 
criterion of the form E||:r( nc ' iose "* ) — x\\/ min n E||x^ n ^ — x|| because already one 
much better estimator in the denominator can spoil the results in our case. The 
bar plot of Figure [3] bins the solutions according to their efficiency. The bins 
are taken on a double exponential scale, ratios larger than 64 were considered 
as huge. 

As we can see, quasi-optimality and hardened balancing, as defined here, 
are superior to the Lepski-balancing principle, which suffers from a consider- 
able number of bad results. Although in this situation the hardened balancing 
principle performs slightly worse, it has a particular advantage. Generally, it is 
relatively insensitive towards a bad choice of the subsampling whereas already 
one badly chosen distance in the subsampling could bring quasi-optimality out 
of track. Yet, it should be recalled that it relies significantly on an estimate of 
the stochastic noise level s(-), obtained from additional data sets. 



5 Perspectives 

Numerical experiments indicate that the results can be carried over to other 
regularization methods like Tikhonov regularization. Unfortunately, the cor- 
relation structure induced by Tikhonov regularization makes an analysis in 
comparison to the spectral cut-off regularization much more demanding and 
remains a topic of future research. 

Interestingly, a way to improve the numerical results for spectral cut-off even 
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further is to use a two-step procedure, where first a regularization parameter is 
obtained from a very coarse subsampling and then the region around the chosen 
regularization parameter is reconsidered with a finer subsampling. These types 
of procedures work provably well for change point detection problems (e.g. 
Korostelev (1987)), a setting, which is to some extent related with our search 
for the intersection point of &(•) and s(-). 



6 Proofs 

In order to evaluate the probabilities of staying above or below the threshold, 
we will make use of the following technical result. 

Lemma 6.1. Let Z = Ylk=l a k^k w ^ YlkLi a \ = 1 an d Cfc ~ -W(0, 1) iid. 
Then 

Vz G (0,1) : P(Z < z) < expf 1 ~ Z + 1 ° g 9 ^ y Vz > : P(Z > z) < V2e~ z / 4 . 

\ 2 maxfc af, I 

Proof. For all A > and z G (0, 1) we have 

oo 

P(Z < z) = P (exp ( - A(^4C|)) > e" 

fc=i 

oo 

<e Xz HE[eM-^lC 2 k )} 

k=l 

oo 

= e ^n( 1 + 2A «lr 1/2 - 

fc=l 

-1/2 ^ / log(l+e) 
/ < pyn si Li 



Xz 



Due to (1 + 2X)- 1 ' 1 < exp(- m6ll+£J s) for x G [1, 1 + e], e > 0, we deduce for 

4) 



A = e(2 maxfc a?) 1 and e = z 1 — 1: 



e ( log(l / 1 - z + log(z) 



H z <z)< exp y 2 — = exp , - 

V 2 max/c V e / / V 2 max/% a£ 

By Jensen's inequality we obtain for z > 



(Z > z) < e-/ 4 E exp ( ]T a 2 k ( 2 k /A) < e"^ 4 E[exp(C 2 /4)] = V2e 



-z/4 



k=l 



(3) 
□ 

of Lemma [37B . For each n > 1 and a > we derive from the concavity of the 
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log-function and Jensen's inequality 
R a (x {n) f =t[\\x^ -x\\ a ] 2,a 



> exp (E[log( 



x 



exp (E 



(n) _ 



+ E 



s(n)+f>(n) ^ s(n)+b(n) 
k=l k=£(n)+l 



+ log(s(n) + b(n)) 



e{n) oo 
k=l k=e(n)+l 

= (s(n) + b(n))e E t l0 ^ = e^°^R 2 (x^) 2 , 

which gives the first inequality. For the second, we use w a ^ 2 /T(^ + 1) < e w 
for w > 0, Lemma 16. II and (1 — w) -1 / 2 < 2 W for to G (0, 1/2] to obtain for any 
a > and A G (0, min( 7 (£(n) + l)" 2 , a(£(n))- 2 )/4): 

E[||fW - x|| a ] = T(f + l)A- Q / 2 E[A Q / 2 ||x^ - x|| Q /r(f + 1)] 

< T(f + l)A- a/2 E[exp(A||x (n) - x\\ 2 )} 

On) oo 

= T(f + l)A" a/2 J](l-2Aa(A:) 2 )- 1 / 2 Yl (1 - 2A 7 (A:) 2 )- 1 / 2 

fc=l k=E(n)+l 

< r(f + l)A~ a/2 exp(2 log(2)A(s(n) + 6(n))). 

The choice A = (s(n) + 6(n)) _1 fulfills the above requirement and thus yields 

R a (x M ) = E[\\x^ - xH ] 1 /" < (4T(f + I)) 1/Q ii2(£ (n) ). 

The last inequality follows from the two others together with the fact that 
Assumption 12.11 implies 

mini? 2 (x (ri) ) 2 >min(6(n # ),s(n # + l)) > ^R 2 ( x ^) 2 . 



□ 



of Proposition [Ql Let us first introduce the following quantities: 

o(n) 



D{n) 
Z(n) 



Vxtn)Vi4n + 1) - s(n)) + (b(n) - b(n + 1)), 
X(n)||f ( 



_ ^(«)|| 2 



VflR/o(n). 

We shall only consider n > because the case n < n& is symmetric (exchang- 
ing s and b) and the case n = ri& is trivial. For m := n — ri^ > 1 we find by 
Assumptions 12.11 and 12.31 

a(n#) 2 < X (n*) {C s - l)s(n#) + (C b - l)b{n* + 1) 



a(n# + m) 2 x( n * + 771 ) ( c s — l)s(n# + m) 

(C fl - l)s(n#) + (C 6 - l)s(n# + 1) 



< 



C h C, - 1 / c 



1 vc 



(c s - l)c™s(n#) 



p(n). 
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Using Lemma 16. 11 we infer for any K > 

P(n* = n) < f(D(n) < D(n*)) 

< F(Z(n) 2 < p(n)Z(n*f) 

< F(Z(n) 2 < Kp{n) log^n)' 1 )) + P(Z(n*) 2 > K\og(p{n)- v )) 
1 - Kp(n) \og{p(n)~ l ) + \og{Kp{n) log(/9(n) -1 )) 



< 



exp 



a(n) 

+ V2exp(-Klog(p(n)^ 1 )/4) 

= V2p{n) K ^ + (Kep(n) \og{p{n)~ l )) r{n ^ 2 . 
The choice K = 2r(n) yields the result. □ 



of Theorem \3.4\ To prove that n* is well defined, we infer from the proof of 
Proposition 13.31 that for all n > n* 

P(D(n) < D(n*)) < V2p{n) r{n ^ 2 + {2er{n)p{n)\og{p{ny l )) r ^l 2 . 

This exponential decay implies 

oo 

lim F(3n > m : (n) < D(n*)) < lim V f(D(n) < Din*)) = 0, 



m— >oo 

n=m 



which means that the probability that the criterion D(-) is larger for some 
n > m than at tends to zero as m — > oo, hence n* = argmin ra D(n) is well 
defined with probability one. 

For the main assertion we use Holder's inequality with p~ 1 +q~ 1 = 1, Lemma 
13.21 and Assumption 12.11 for any a > to obtain: 

E[\\x^-x\\ a ] 

oo 

= J2 n\\x {n * +m) -x\n {n *=n* +m} ] 



m=— n#+I 



< J2 n\* {n +m) - *\n 1/p ni q {n , =n # +m} ] 1/q 

m=— n#+l 
oo 

< Yj R ap {x {n * +m) ) a Hn* = n* + m) x l q 

m=— n#+l 

oo 

< (4r(f + l)) 1 ^ Yj R2{x (n * +m) ) a Hn* =n#+ m) l/q 

m=— n#+l 

oo n* — 1 

< (4T(f + l)) 1 /P( 2s (n#)) a ( ^ C s Qm / 2 P(n* = n# + m) 1 ^ + ]T C fe am/2 P(n* 

m=0 m=0 
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We choose <?:=(§ + § min( °fg(^ , "fog^c ) ))/ Q > ^ anc ^ infer from Proposi- 
tion [33] 

CO 

^ C s Qm/2 P(n* = n* + m) 1/g 

m=l 

< <V2 + <*r)")"« (^^)) r/2 'E^ (£f 

which is finite by the choice of </ and the restriction on a. With a symmetric 
argument for m < we conclude that E[||x( n *'' — x|| a ] is bounded by a multiple 
of s(ri&) a l 2 , which has the order of R2{x^ n *^) a . We eventually obtain with 
constants K = K(c s ,Cb,C s ,Cb,r,a), K' = K'(c s ,Cb,C s ,Cb,r,a) (note that p 
and q depend on a and the remaining constants): 

R a {x {n * ] ) < K'R 2 (x^) < KmmR a (x^). 

n>l 

□ 

Acknowledgements 

The first author gratefully acknowledges the financial support by the Upper 
Austrian Technology and Research Promotion. We are grateful for the con- 
structive criticism by two anonymous referees. 

References 

Bakushinskii, A. (1984): "Remarks on choosing a regularization parameter 
using the quasi-optimality and ratio criterion," Comput. Maths. Math. Phys., 
24(4), 181-182. 

Bauer, F. (2007): "Some considerations concerning regularization and param- 
eter choice algorithms," Inverse Problems, 23, 837-858. 

Bauer, F., and S. Pereverzev (2005): "Regularization without preliminary 
knowledge of smoothness and error behavior," European Journal of Applied 
Mathematics, 16(3), 303-317. 

Belomestny, D., and M. Reiss (2006): "Spectral calibration of exponential 
Levy models," Finance Stoch., 10(4), 449-474. 

Cohen, A., M. Hoffmann, and M. Reiss (2004): "Adaptive wavelet 
Galerkin methods for linear inverse problems," SIAM Journal of Numeri- 
cal Analysis, 42(4), 1479-1501. 

Cont, R., and P. Tankov (2004): Financial Modelling with Jump Pro- 
cesses. Chapman & Hall/CRC Financial Mathematics Series. Boca Raton, 
FL: Chapman and Hall/CRC. 



17 



Crepey, S. (2003): "Calibration of the local volatility in a generalized Black- 
Scholes model using Tikhonov regularization.," SIAM J. Math. Anal, 34(5), 
1183-1206. 

Egger, H., T. Hein, and B. Hofmann (2006): "On decoupling of volatility 
smile and term structure in inverse option pricing.," Inverse ProbL, 22(4), 
1247-1259. 

Engl, H., M. Hanke, and A. Neubauer (1996): Regularization of Inverse 
Problems. Kluwer Academic Publisher, Dordrecht, Boston, London. 

Kaipio, J., and E. Somersalo (2005): Statistical and Computational Inverse 
Problems, vol. 160 of Applied Mathematical Sciences. Springer- Verlag, New 
York. 

KOROSTELEV, A. (1987): "On minimax estimation of a discontinuous signal," 
Theory Prob. Appl, 32(4), 727-730. 

Lepski, O. (1990): "On a problem of adaptive estimation in Gaussian white 
noise," Theory of Probability and its Applications, 35(3), 454-466. 

Tikhonov, A., and V. Arsenin (1977): Solutions of Ill-Posed Problems. 
Wiley, New York. 

Tikhonov, A., V. Glasko, and Y. Kriksin (1979): "On the question of 
quasi-optimal choice of a regularized approximation.," Sov. Math. Dokl., 20, 
1036-1040. 

Wahba, G. (1977): "Practical approximate solutions to linear operator equa- 
tions when the data are noisy," SIAM Journal on Numerical Analysis, 14(4), 
651-667. 

Wasserman, L. (2006): All of Nonparametric Statistics. Springer Texts in 
Statistics. New York, NY: Springer. 



18 



This figure "data.jpg" is available in "jpg" format from: 



http://arXiv.org/ps/0710.1045vl 



This figure "efficiencies.jpg" is available in "jpg" format from: 



http://arXiv.org/ps/0710.1045vl 



This figure "example.jpg" is available in "jpg" format from: 



http://arXiv.org/ps/0710.1045vl 



This figure "optimalsol.jpg" is available in "jpg" format from: 



http://arXiv.org/ps/0710.1045vl 



